where
$Y_{ijk}$ is the log transformed ridership level for origin-destination pair $i$, with orign $j$ and destination $k$
$\theta_0$ is the grand mean ridership level for all OD pairs
$b_{00j}$ is the random main effect of origin j (the contribution of origin $j$ averaged across all destinations)
$c_{00k}$ is the random main effect of destination k (the contribution of destination $k$ averaged across all origins)
$d_{0jk}$ is the random interaction effect of $i$ and $j$
$e_{ijk}$ is the random OD effect (i.e. random error associated with each OD pair)
The fully specified (combined) cross-classified model is given by Raudenbush and Bryk (2002, p. 384), as:
$$Y_{ijk} = \theta + \theta a_{ijk} +\beta_0X_k +\gamma_0W_j +\delta_0 X_k*W_j+\beta_1X_k*a_{ijk} + \gamma_1W_j*a_{ijk} + \delta X_k*W_j*a_{ijk} + b_{01j}X_k + c_{01k}W_j + b_{10j}a_{ijk} +c_{10k}a_{ijk} + d_{1jk}a_{ijk} + b_{11j}X_k*a_{ijk} + c_{11k}W_j*a_{ijk} + b_{00j} + c_{00k} + d_{0jk} + e_{ijk}$$
first load some utility/plotting packages for mixed models
lib <- c("lattice", "foreign", "nlme", "lme4", "car", "descr", "effects", "xtable", "texreg", "lmtest", "Hmisc", "ggplot2", "coefplot2", "MASS")
require(lib)
lapply(lib, require, character.only=T)
Loading required package: lib Warning message: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘lib’Loading required package: MASS
## re = object of class ranef.mer
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE) {
require(ggplot2)
f <- function(x) {
pv <- attr(x, "postVar")
cols <- 1:(dim(pv)[1])
se <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
ord <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
pDf <- data.frame(y=unlist(x)[ord],
ci=1.96*se[ord],
nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
ind=gl(ncol(x), nrow(x), labels=names(x)))
if(QQ) { ## normal QQ-plot
p <- ggplot(pDf, aes(nQQ, y))
p <- p + facet_wrap(~ ind, scales="free")
p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
} else { ## caterpillar dotplot
p <- ggplot(pDf, aes(ID, y)) + coord_flip()
if(likeDotplot) { ## imitate dotplot() -> same scales for random effects
p <- p + facet_wrap(~ ind)
} else { ## different scales for random effects
p <- p + facet_grid(ind ~ ., scales="free_y")
}
p <- p + xlab("Levels") + ylab("Random effects")
}
p <- p + theme(legend.position="none")
p <- p + geom_hline(yintercept=0)
p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
p <- p + geom_point(aes(size=1.2), colour="blue")
return(p)
}
lapply(re, f)
}
wmata <- read.dta('Data/Collapsed_Example_0616_Hiro1.dta')
wmata <- subset(wmata, ridersum >1)
histogram(log(wmata$ridersum))
null_model <- gls( log(ridersum) ~ 1, data=wmata)
null_model_o <- lmer( log(ridersum) ~ 1+ (1 | mstn_id_o), data = wmata)
null_model_d <- lmer( log(ridersum) ~ 1+ (1 | mstn_id_d), data = wmata)
null_model_od <- lmer (log(ridersum) ~ 1 +(1| mstn_id_d) + (1| mstn_id_o), data = wmata)
first, we test whether each of the models with random intercepts are better than the fixed intercept model
lrtest(null_model, null_model_o)
lrtest(null_model, null_model_d)
lrtest(null_model_o, null_model_d)
Warning message: In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "gls", updated model is of class "lmerMod"
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 2 | -13746.16 | NA | NA | NA |
| 2 | 3 | -13311.89 | 1 | 868.5514 | 6.737529e-191 |
Warning message: In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "gls", updated model is of class "lmerMod"
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 2 | -13746.16 | NA | NA | NA |
| 2 | 3 | -11462.13 | 1 | 4568.071 | 0 |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 3 | -13311.89 | NA | NA | NA |
| 2 | 3 | -11462.13 | 0 | 3699.52 | 0 |
both models are significantly better than the fixed intercept model. Between the three models, the random intercept for destinations seems to explain the most variance.
So far, we know that there are station effects, and that destination effects are larger than origin effects
lrtest(null_model_o, null_model_od)
lrtest(null_model_d, null_model_od)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 3 | -13311.89 | NA | NA | NA |
| 2 | 4 | -10221.1 | 1 | 6181.568 | 0 |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 3 | -11462.13 | NA | NA | NA |
| 2 | 4 | -10221.1 | 1 | 2482.048 | 0 |
htmlreg(list(null_model_o,null_model_d, null_model_od), single.row =TRUE,type="html", custom.model.names = c("Random Origin", "Random Destination","Cross Classified"))
| Random Origin | Random Destination | Cross Classified | |
|---|---|---|---|
| (Intercept) | 5.08 (0.08)*** | 5.07 (0.14)*** | 5.05 (0.16)*** |
| AIC | 26629.78 | 22930.26 | 20450.21 |
| BIC | 26650.32 | 22950.80 | 20477.60 |
| Log Likelihood | -13311.89 | -11462.13 | -10221.10 |
| Num. obs. | 6961 | 6961 | 6961 |
| Num. groups: mstn_id_o | 84 | 84 | |
| Variance: mstn_id_o.(Intercept) | 0.45 | 0.50 | |
| Variance: Residual | 2.59 | 1.49 | 0.99 |
| Num. groups: mstn_id_d | 84 | 84 | |
| Variance: mstn_id_d.(Intercept) | 1.58 | 1.62 | |
| ***p < 0.001, **p < 0.01, *p < 0.05 | |||
likelihood ratio tests show that the cross classified model is significantly better than each of the random intercept models; it also has the smallest residual variance and AIC/BIC
so far so good
plot(null_model_od)
qqnorm(resid(null_model_od))
ggCaterpillar(ranef(null_model_od, condVar=TRUE))
Warning message: : ‘mode(antialias)’ differs between new and previous ==> NOT changing ‘antialias’
Warning message: : ‘mode(antialias)’ differs between new and previous ==> NOT changing ‘antialias’
Error in vapply(seq_along(mapped), function(i) {: values must be length 1,
but FUN(X[[1]]) result is length 0
$mstn_id_d $mstn_id_o
the residuals ok. there are some groups clustered at the bottom, i.e. certain Os and Ds that have low levels of ridership. Apart from that, they look random
qqplots show the residuals and random effects are approximately normal. The lines aren't perfect, but I don't think we need to worry about it yet
the first model adds fixed effects for each OD pair (we keep the cross-classified specification)
model_1<- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log1p(travel_time) +
(1 | mstn_id_d)+ (1 | mstn_id_o),
data = wmata)
htmlreg(model_1, type="html", single.row =TRUE,custom.model.names = "Model 1")
lrtest(null_model_od, model_1)
| Model 1 | |
|---|---|
| (Intercept) | 9.71 (0.17)*** |
| log(peak_fare_05) | -1.62 (0.16)*** |
| log(off_peak_fare_05) | 2.28 (0.19)*** |
| log(track_mile) | 0.17 (0.12) |
| log(inv_comp_mile) | -1.13 (0.09)*** |
| log1p(travel_time) | -2.28 (0.05)*** |
| AIC | 16441.27 |
| BIC | 16502.90 |
| Log Likelihood | -8211.64 |
| Num. obs. | 6961 |
| Num. groups: mstn_id_d | 84 |
| Num. groups: mstn_id_o | 84 |
| Variance: mstn_id_d.(Intercept) | 1.45 |
| Variance: mstn_id_o.(Intercept) | 0.54 |
| Variance: Residual | 0.55 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 4 | -10221.1 | NA | NA | NA |
| 2 | 9 | -8211.636 | 5 | 4018.937 | 0 |
This is already a much better model than the null_od... we cut the residual variance in half
Model 2 adds more fixed effects to match Hiro's specification. Note that we still haven't centered any of the variables so the intercept terms don't really make sense and can't be interpreted. This is ok for the moment since we're just exploring some specifications, but eventually we'll need to center variables like peak_fare and track_miles that don't have a true zero. That way we can interpret the random intercepts as the group effects for OD pairs having the average fare and average trip duration, etc
model_2 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
(1 | mstn_id_o) + (1 | mstn_id_d),
data = wmata)
htmlreg(list(model_1, model_2), type="html",single.row =TRUE, custom.model.names = c("Model 1","Model 2"))
lrtest(model_1, model_2)
| Model 1 | Model 2 | |
|---|---|---|
| (Intercept) | 9.71 (0.17)*** | 1.53 (2.10) |
| log(peak_fare_05) | -1.62 (0.16)*** | -1.51 (0.15)*** |
| log(off_peak_fare_05) | 2.28 (0.19)*** | 1.68 (0.19)*** |
| log(track_mile) | 0.17 (0.12) | 0.12 (0.13) |
| log(inv_comp_mile) | -1.13 (0.09)*** | -0.84 (0.11)*** |
| log1p(travel_time) | -2.28 (0.05)*** | -1.65 (0.06)*** |
| log_tt_ratio2_fill | 0.29 (0.03)*** | |
| tt_ratio201 | -0.81 (0.09)*** | |
| log_parking_users | 0.27 (0.01)*** | |
| factor(redline_OD)1 | 0.53 (0.04)*** | |
| factor(m25_1)1 | 0.45 (0.14)** | |
| factor(m25_2)1 | 0.73 (0.29)* | |
| logjobshalfUp_D | 0.81 (0.12)*** | |
| log_tphpeakv2_D | 0.85 (0.33)** | |
| log_buslinescount_D | 0.19 (0.15) | |
| log_ParkingCapacityNew_O | 0.12 (0.02)*** | |
| log_hh_o | 0.26 (0.06)*** | |
| log_buslinescount_O | 0.16 (0.13) | |
| log(MedianHHIncome_O) | -0.09 (0.16) | |
| factor(GoodService_D)1 | -0.05 (0.25) | |
| AIC | 16441.27 | 15755.51 |
| BIC | 16502.90 | 15913.02 |
| Log Likelihood | -8211.64 | -7854.76 |
| Num. obs. | 6961 | 6961 |
| Num. groups: mstn_id_d | 84 | 84 |
| Num. groups: mstn_id_o | 84 | 84 |
| Variance: mstn_id_d.(Intercept) | 1.45 | 0.35 |
| Variance: mstn_id_o.(Intercept) | 0.54 | 0.34 |
| Variance: Residual | 0.55 | 0.50 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | ||
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 9 | -8211.636 | NA | NA | NA |
| 2 | 23 | -7854.756 | 14 | 713.7606 | 2.978478e-143 |
model 3 allows the coefficient for origin households to vary randomly (by origin)
model_3 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
(1 + log_hh_o| mstn_id_o) + (1 | mstn_id_d),
data = wmata)
htmlreg(model_3, type="html", single.row =TRUE,custom.model.names = "Model 3")
lrtest(model_2, model_3)
| Model 3 | |
|---|---|
| (Intercept) | 3.88 (1.93)* |
| log(peak_fare_05) | -1.50 (0.15)*** |
| log(off_peak_fare_05) | 1.68 (0.19)*** |
| log(track_mile) | 0.12 (0.13) |
| log(inv_comp_mile) | -0.84 (0.11)*** |
| log_tt_ratio2_fill | 0.28 (0.03)*** |
| tt_ratio201 | -0.81 (0.09)*** |
| log_parking_users | 0.27 (0.01)*** |
| factor(redline_OD)1 | 0.53 (0.04)*** |
| factor(m25_1)1 | 0.37 (0.13)** |
| factor(m25_2)1 | 0.56 (0.27)* |
| logjobshalfUp_D | 0.81 (0.12)*** |
| log_tphpeakv2_D | 0.88 (0.33)** |
| log_buslinescount_D | 0.21 (0.15) |
| log_ParkingCapacityNew_O | 0.09 (0.02)*** |
| log_hh_o | 0.35 (0.08)*** |
| log_buslinescount_O | 0.23 (0.12) |
| log1p(travel_time) | -1.65 (0.06)*** |
| log(MedianHHIncome_O) | -0.34 (0.14)* |
| factor(GoodService_D)1 | -0.06 (0.25) |
| AIC | 15753.69 |
| BIC | 15924.89 |
| Log Likelihood | -7851.85 |
| Num. obs. | 6961 |
| Num. groups: mstn_id_o | 84 |
| Num. groups: mstn_id_d | 84 |
| Variance: mstn_id_o.(Intercept) | 0.66 |
| Variance: mstn_id_o.log_hh_o | 0.09 |
| Variance: mstn_id_d.(Intercept) | 0.35 |
| Variance: Residual | 0.50 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 23 | -7854.756 | NA | NA | NA |
| 2 | 25 | -7851.845 | 2 | 5.820281 | 0.05446807 |
not a huge difference. lets try only a random coefficient for destination jobs
model_4 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
(1 | mstn_id_o) + (1 + logjobshalfUp_D | mstn_id_d),
data = wmata)
lrtest(model_2, model_4)
lrtest(model_3, model_4)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 23 | -7854.756 | NA | NA | NA |
| 2 | 25 | -7845.053 | 2 | 19.40467 | 6.11405e-05 |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7851.845 | NA | NA | NA |
| 2 | 25 | -7845.053 | 0 | 13.58439 | 0 |
the random jobs coefficient outperforms the random households coefficient
now a model with both of the random coefficients we tried
model_5 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) +
(1 + log_hh_o | mstn_id_o) + (1 + logjobshalfUp_D | mstn_id_d),
data = wmata)
lrtest(model_2, model_5)
lrtest(model_4, model_5)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 23 | -7854.756 | NA | NA | NA |
| 2 | 26 | -7841.505 | 3 | 26.50182 | 7.487352e-06 |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7845.053 | NA | NA | NA |
| 2 | 26 | -7841.505 | 1 | 7.097145 | 0.007720684 |
to make sure everything is still kosher
plot(model_5)
qqnorm(resid(model_5))
qqmath(ranef(model_5))
Warning message: : ‘mode(antialias)’ differs between new and previous ==> NOT changing ‘antialias’
Warning message: : ‘mode(antialias)’ differs between new and previous ==> NOT changing ‘antialias’
Warning message: : ‘mode(antialias)’ differs between new and previous ==> NOT changing ‘antialias’
Error in vapply(seq_along(mapped), function(i) {: values must be length 1,
but FUN(X[[1]]) result is length 0
$mstn_id_o $mstn_id_d
htmlreg(list(null_model_od, model_1, model_2, model_3,model_4, model_5),single.row =TRUE, caption = "Model Comparison", custom.model.names =c("Null", "Reduced", "Full", "Random HH", "Random Jobs", "Dual Random"))
| Null | Reduced | Full | Random HH | Random Jobs | Dual Random | |
|---|---|---|---|---|---|---|
| (Intercept) | 5.05 (0.16)*** | 9.71 (0.17)*** | 1.53 (2.10) | 3.88 (1.93)* | 1.80 (1.95) | 3.82 (1.70)* |
| log(peak_fare_05) | -1.62 (0.16)*** | -1.51 (0.15)*** | -1.50 (0.15)*** | -1.55 (0.15)*** | -1.54 (0.15)*** | |
| log(off_peak_fare_05) | 2.28 (0.19)*** | 1.68 (0.19)*** | 1.68 (0.19)*** | 1.72 (0.19)*** | 1.72 (0.19)*** | |
| log(track_mile) | 0.17 (0.12) | 0.12 (0.13) | 0.12 (0.13) | 0.11 (0.13) | 0.11 (0.13) | |
| log(inv_comp_mile) | -1.13 (0.09)*** | -0.84 (0.11)*** | -0.84 (0.11)*** | -0.84 (0.11)*** | -0.85 (0.11)*** | |
| log1p(travel_time) | -2.28 (0.05)*** | -1.65 (0.06)*** | -1.65 (0.06)*** | -1.64 (0.06)*** | -1.64 (0.06)*** | |
| log_tt_ratio2_fill | 0.29 (0.03)*** | 0.28 (0.03)*** | 0.29 (0.03)*** | 0.29 (0.03)*** | ||
| tt_ratio201 | -0.81 (0.09)*** | -0.81 (0.09)*** | -0.83 (0.09)*** | -0.83 (0.09)*** | ||
| log_parking_users | 0.27 (0.01)*** | 0.27 (0.01)*** | 0.27 (0.01)*** | 0.27 (0.01)*** | ||
| factor(redline_OD)1 | 0.53 (0.04)*** | 0.53 (0.04)*** | 0.52 (0.04)*** | 0.52 (0.04)*** | ||
| factor(m25_1)1 | 0.45 (0.14)** | 0.37 (0.13)** | 0.42 (0.11)*** | 0.37 (0.11)*** | ||
| factor(m25_2)1 | 0.73 (0.29)* | 0.56 (0.27)* | 0.65 (0.23)** | 0.56 (0.22)* | ||
| logjobshalfUp_D | 0.81 (0.12)*** | 0.81 (0.12)*** | 1.06 (0.11)*** | 1.09 (0.10)*** | ||
| log_tphpeakv2_D | 0.85 (0.33)** | 0.88 (0.33)** | 0.64 (0.21)** | 0.77 (0.16)*** | ||
| log_buslinescount_D | 0.19 (0.15) | 0.21 (0.15) | 0.02 (0.10) | 0.03 (0.10) | ||
| log_ParkingCapacityNew_O | 0.12 (0.02)*** | 0.09 (0.02)*** | 0.12 (0.02)*** | 0.09 (0.02)*** | ||
| log_hh_o | 0.26 (0.06)*** | 0.35 (0.08)*** | 0.26 (0.06)*** | 0.35 (0.08)*** | ||
| log_buslinescount_O | 0.16 (0.13) | 0.23 (0.12) | 0.18 (0.13) | 0.23 (0.12)* | ||
| log(MedianHHIncome_O) | -0.09 (0.16) | -0.34 (0.14)* | -0.09 (0.16) | -0.34 (0.14)* | ||
| factor(GoodService_D)1 | -0.05 (0.25) | -0.06 (0.25) | 0.12 (0.16) | |||
| AIC | 20450.21 | 16441.27 | 15755.51 | 15753.69 | 15740.11 | 15735.01 |
| BIC | 20477.60 | 16502.90 | 15913.02 | 15924.89 | 15911.31 | 15913.06 |
| Log Likelihood | -10221.10 | -8211.64 | -7854.76 | -7851.85 | -7845.05 | -7841.50 |
| Num. obs. | 6961 | 6961 | 6961 | 6961 | 6961 | 6961 |
| Num. groups: mstn_id_d | 84 | 84 | 84 | 84 | 84 | 84 |
| Num. groups: mstn_id_o | 84 | 84 | 84 | 84 | 84 | 84 |
| Variance: mstn_id_d.(Intercept) | 1.62 | 1.45 | 0.35 | 0.35 | 3.21 | 3.23 |
| Variance: mstn_id_o.(Intercept) | 0.50 | 0.54 | 0.34 | 0.66 | 0.34 | 0.66 |
| Variance: Residual | 0.99 | 0.55 | 0.50 | 0.50 | 0.50 | 0.50 |
| Variance: mstn_id_o.log_hh_o | 0.09 | 0.09 | ||||
| Variance: mstn_id_d.logjobshalfUp_D | 0.12 | 0.13 | ||||
| ***p < 0.001, **p < 0.01, *p < 0.05 | ||||||
So far, the results show that we definitely want to include a cross-classified specification, and that both random intercepts are significant. We get much better likelihoods and fit criteria moving from the null model to model 2.
In models 3, 4, and 5, we add random coefficients, and although the likelihoods get a little better, they only get marginally better--and we stop explaining additional variance, the residual is stable at .50. In fact the likelihood ratio test between models 4 and 5 is just barely insignificant. This means that allowing coefficient for destination jobs to vary adds information, but also allowing origin households to vary does not add much.
Once we get the fixed effects specified well, I think we'll be in really good shape
model_6 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
(1 | mstn_id_o) + (1 + factor(GoodService_D) + logjobshalfUp_D | mstn_id_d),
data = wmata)
lrtest(model_4, model_6)
lrtest(model_4, model_6)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7845.053 | NA | NA | NA |
| 2 | 28 | -7841.278 | 3 | 7.549562 | 0.05629857 |
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7845.053 | NA | NA | NA |
| 2 | 28 | -7841.278 | 3 | 7.549562 | 0.05629857 |
model_7 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
(1 | mstn_id_o) + (1 | mstn_id_d),
data = wmata)
lrtest(model_4, model_7)
htmlreg(list(model_4, model_7), single.row=TRUE)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7845.053 | NA | NA | NA |
| 2 | 24 | -7854.747 | -1 | 19.38699 | 1.067314e-05 |
| Model 1 | Model 2 | |
|---|---|---|
| (Intercept) | 1.80 (1.95) | 3.06 (2.28) |
| log(peak_fare_05) | -1.55 (0.15)*** | -1.49 (0.15)*** |
| log(off_peak_fare_05) | 1.72 (0.19)*** | 1.67 (0.19)*** |
| log(track_mile) | 0.11 (0.13) | 0.13 (0.13) |
| log(inv_comp_mile) | -0.84 (0.11)*** | -0.84 (0.11)*** |
| log_tt_ratio2_fill | 0.29 (0.03)*** | 0.29 (0.03)*** |
| tt_ratio201 | -0.83 (0.09)*** | -0.81 (0.09)*** |
| log_parking_users | 0.27 (0.01)*** | 0.27 (0.01)*** |
| factor(redline_OD)1 | 0.52 (0.04)*** | 0.53 (0.04)*** |
| factor(m25_1)1 | 0.42 (0.11)*** | 0.43 (0.14)** |
| factor(m25_2)1 | 0.65 (0.23)** | 0.69 (0.29)* |
| logjobshalfUp_D | 1.06 (0.11)*** | 0.74 (0.13)*** |
| log_tphpeakv2_D | 0.64 (0.21)** | 0.52 (0.38) |
| log_buslinescount_D | 0.02 (0.10) | 0.22 (0.14) |
| log_ParkingCapacityNew_O | 0.12 (0.02)*** | 0.12 (0.02)*** |
| log_hh_o | 0.26 (0.06)*** | 0.26 (0.06)*** |
| log_buslinescount_O | 0.18 (0.13) | 0.17 (0.13) |
| log1p(travel_time) | -1.64 (0.06)*** | -1.66 (0.06)*** |
| log(MedianHHIncome_O) | -0.09 (0.16) | -0.09 (0.16) |
| factor(GoodService_D)1 | 0.12 (0.16) | 0.05 (0.25) |
| log(MilesfromCore_D) | -0.16 (0.10) | |
| AIC | 15740.11 | 15757.49 |
| BIC | 15911.31 | 15921.85 |
| Log Likelihood | -7845.05 | -7854.75 |
| Num. obs. | 6961 | 6961 |
| Num. groups: mstn_id_o | 84 | 84 |
| Num. groups: mstn_id_d | 84 | 84 |
| Variance: mstn_id_o.(Intercept) | 0.34 | 0.34 |
| Variance: mstn_id_d.(Intercept) | 3.21 | 0.34 |
| Variance: mstn_id_d.logjobshalfUp_D | 0.12 | |
| Variance: Residual | 0.50 | 0.50 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | ||
model_8 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
(1 | mstn_id_o) + (1 | mstn_id_d),
data = wmata)
lrtest(model_7, model_8)
htmlreg(list(model_7, model_8), single.row=TRUE)
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 24 | -7854.747 | NA | NA | NA |
| 2 | 24 | -7854.747 | 0 | 0 | 1 |
| Model 1 | Model 2 | |
|---|---|---|
| (Intercept) | 3.06 (2.28) | 3.06 (2.28) |
| log(peak_fare_05) | -1.49 (0.15)*** | -1.49 (0.15)*** |
| log(off_peak_fare_05) | 1.67 (0.19)*** | 1.67 (0.19)*** |
| log(track_mile) | 0.13 (0.13) | 0.13 (0.13) |
| log(inv_comp_mile) | -0.84 (0.11)*** | -0.84 (0.11)*** |
| log_tt_ratio2_fill | 0.29 (0.03)*** | 0.29 (0.03)*** |
| tt_ratio201 | -0.81 (0.09)*** | -0.81 (0.09)*** |
| log_parking_users | 0.27 (0.01)*** | 0.27 (0.01)*** |
| factor(redline_OD)1 | 0.53 (0.04)*** | 0.53 (0.04)*** |
| factor(m25_1)1 | 0.43 (0.14)** | 0.43 (0.14)** |
| factor(m25_2)1 | 0.69 (0.29)* | 0.69 (0.29)* |
| logjobshalfUp_D | 0.74 (0.13)*** | 0.74 (0.13)*** |
| log_tphpeakv2_D | 0.52 (0.38) | 0.52 (0.38) |
| log_buslinescount_D | 0.22 (0.14) | 0.22 (0.14) |
| log_ParkingCapacityNew_O | 0.12 (0.02)*** | 0.12 (0.02)*** |
| log_hh_o | 0.26 (0.06)*** | 0.26 (0.06)*** |
| log_buslinescount_O | 0.17 (0.13) | 0.17 (0.13) |
| log1p(travel_time) | -1.66 (0.06)*** | -1.66 (0.06)*** |
| log(MedianHHIncome_O) | -0.09 (0.16) | -0.09 (0.16) |
| factor(GoodService_D)1 | 0.05 (0.25) | 0.05 (0.25) |
| log(MilesfromCore_D) | -0.16 (0.10) | -0.16 (0.10) |
| AIC | 15757.49 | 15757.49 |
| BIC | 15921.85 | 15921.85 |
| Log Likelihood | -7854.75 | -7854.75 |
| Num. obs. | 6961 | 6961 |
| Num. groups: mstn_id_o | 84 | 84 |
| Num. groups: mstn_id_d | 84 | 84 |
| Variance: mstn_id_o.(Intercept) | 0.34 | 0.34 |
| Variance: mstn_id_d.(Intercept) | 0.34 | 0.34 |
| Variance: Residual | 0.50 | 0.50 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | ||
coefplot2(model_4)
residPlots(model_5)
Error in eval(expr, envir, enclos): could not find function "residPlots"
xyplot(residuals(model_4) ~ fitted(model_4),
panel=function(x, y){
panel.xyplot(x, y)
panel.loess(x, y, span = 0.75)
panel.lmline(x, y, lty = 2) # Least squares broken line
}
)
glm_model_1 <- glmer(ridersum ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
+ log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
(1 | mstn_id_o) + (1 | mstn_id_d), data = wmata, family = poisson)
Warning message: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio - Rescale variables?
plot(glm_model_1)
lrtest(model_4, glm_model_1)
Warning message: In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "lmerMod", updated model is of class "glmerMod"
| #Df | LogLik | Df | Chisq | Pr(>Chisq) | |
|---|---|---|---|---|---|
| 1 | 25 | -7845.053 | NA | NA | NA |
| 2 | 23 | -615949 | -2 | 1216208 | 0 |
htmlreg(list(model_4, glm_model_1), single.row=TRUE)
| Model 1 | Model 2 | |
|---|---|---|
| (Intercept) | 1.80 (1.95) | -2.50 (1.21)* |
| log(peak_fare_05) | -1.55 (0.15)*** | -1.29 (0.01)*** |
| log(off_peak_fare_05) | 1.72 (0.19)*** | 0.93 (0.01)*** |
| log(track_mile) | 0.11 (0.13) | 0.48 (0.01)*** |
| log(inv_comp_mile) | -0.84 (0.11)*** | -0.54 (0.01)*** |
| log_tt_ratio2_fill | 0.29 (0.03)*** | 0.16 (0.00)*** |
| tt_ratio201 | -0.83 (0.09)*** | -0.39 (0.01)*** |
| log_parking_users | 0.27 (0.01)*** | 0.26 (0.00)*** |
| factor(redline_OD)1 | 0.52 (0.04)*** | 0.90 (0.00)*** |
| factor(m25_1)1 | 0.42 (0.11)*** | 0.46 (0.13)*** |
| factor(m25_2)1 | 0.65 (0.23)** | 0.89 (0.25)*** |
| logjobshalfUp_D | 1.06 (0.11)*** | 0.64 (0.11)*** |
| log_tphpeakv2_D | 0.64 (0.21)** | 0.39 (0.30) |
| log_buslinescount_D | 0.02 (0.10) | 0.28 (0.13)* |
| log_ParkingCapacityNew_O | 0.12 (0.02)*** | 0.07 (0.02)*** |
| log_hh_o | 0.26 (0.06)*** | 0.20 (0.05)*** |
| log_buslinescount_O | 0.18 (0.13) | 0.20 (0.11) |
| log1p(travel_time) | -1.64 (0.06)*** | -1.38 (0.00)*** |
| log(MedianHHIncome_O) | -0.09 (0.16) | 0.46 (0.11)*** |
| factor(GoodService_D)1 | 0.12 (0.16) | -0.05 (0.21) |
| log(MilesfromCore_D) | -0.25 (0.08)** | |
| AIC | 15740.11 | 1231943.98 |
| BIC | 15911.31 | 1232101.49 |
| Log Likelihood | -7845.05 | -615948.99 |
| Num. obs. | 6961 | 6961 |
| Num. groups: mstn_id_o | 84 | 84 |
| Num. groups: mstn_id_d | 84 | 84 |
| Variance: mstn_id_o.(Intercept) | 0.34 | 0.26 |
| Variance: mstn_id_d.(Intercept) | 3.21 | 0.28 |
| Variance: mstn_id_d.logjobshalfUp_D | 0.12 | |
| Variance: Residual | 0.50 | 1.00 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | ||